A Database Index to Large Biological Sequences

Authors

  • Ela Hunt
  • Malcolm P. Atkinson
  • Robert W. Irving
Abstract

We present an approach to searching genetic DNA sequences using an adaptation of the suffix tree data structure deployed on the general-purpose persistent Java platform, PJama. Our implementation technique is novel in that it allows us to build suffix trees on disk for arbitrarily large sequences, for instance the longest human chromosome, which consists of 263 million letters. We propose using such indexes as an alternative to the current practice of serial scanning. We describe our tree creation algorithm, analyse the performance of our index, and discuss the interplay of the data structure with object store architectures. Early measurements are presented.
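To make the contrast between an index lookup and serial scanning concrete, here is a minimal sketch. It is not the authors' algorithm: it builds a naive, uncompressed suffix trie entirely in memory (quadratic in the text length), whereas the abstract describes suffix trees built on disk with PJama. The class name `SuffixTrieSketch`, its methods, and the sample sequence are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Naive in-memory suffix trie. Illustration only; not the paper's disk-based construction. */
class SuffixTrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        // Start positions of every suffix whose path passes through this node.
        final List<Integer> starts = new ArrayList<>();
    }

    private final Node root = new Node();

    /** Inserts every suffix of the text. O(n^2) time and space, so short inputs only. */
    SuffixTrieSketch(String text) {
        for (int start = 0; start < text.length(); start++) {
            Node node = root;
            for (int i = start; i < text.length(); i++) {
                node = node.children.computeIfAbsent(text.charAt(i), c -> new Node());
                node.starts.add(start);
            }
        }
    }

    /** Index lookup: follows one child link per query letter, regardless of text length. */
    List<Integer> occurrences(String query) {
        Node node = root;
        for (int i = 0; i < query.length(); i++) {
            node = node.children.get(query.charAt(i));
            if (node == null) {
                return List.of();
            }
        }
        return List.copyOf(node.starts);
    }

    /** Baseline: serial scan of the whole text for the query. */
    static List<Integer> serialScan(String text, String query) {
        List<Integer> hits = new ArrayList<>();
        if (query.isEmpty()) {
            return hits;
        }
        for (int at = text.indexOf(query); at >= 0; at = text.indexOf(query, at + 1)) {
            hits.add(at);
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "GATTACAGATTA"; // hypothetical sample sequence
        SuffixTrieSketch index = new SuffixTrieSketch(text);
        System.out.println(index.occurrences("ATTA"));
        System.out.println(serialScan(text, "ATTA"));
    }
}
```

Both calls report the same start positions. The difference is that the lookup cost depends on the query length, while the scan reads the whole text on every query. A real suffix tree compresses single-child paths to keep the index linear in size, which this sketch deliberately omits.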


Similar resources

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...


The Investigation of Mutations and Comparison of Leptin Gene Pro-Motor in Najdi Cattle with the Database NCBI Sequences

Objective: Identifying the genetic aspects and major gene influences on energy balance, milk production, fertility, food safety and consumers is a recent interest of genetic and breeding researchers. Methods: Najdi cattle are the most prominent breed in Khuzestan province. To carry out this plan at the Shoushtar Najdi Cattle Station, blood samples were taken from 15 Najdi cattle. DNA was extracted from wh...


Protein Sequence Similarity Search Technique Suitable for Parallel Implementation

Now that we have entered the post-genomic era, a plethora of information is available, both genomic and proteomic. This provides ample resources for applying computational and machine learning strategies to problems of biological relevance. Searching biological databases for similar or homologous sequences is a fundamental step in many bioinformatics tasks. On discovery o...




Publication date: 2001